AITopics | highest performance

Collaborating Authors

highest performance

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

74de5f915765ea59816e770a8e686f38-Supplemental.pdf

Neural Information Processing SystemsFeb-8-2026, 23:14:15 GMT

dataset, primary task, weighting function, (15 more...)

Neural Information Processing Systems

Country: North America > Canada (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Add feedback

74de5f915765ea59816e770a8e686f38-Supplemental.pdf

Neural Information Processing SystemsOct-3-2025, 06:24:14 GMT

dataset, primary task, weighting function, (15 more...)

Neural Information Processing Systems

Country: North America > Canada (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Add feedback

TabStruct: Measuring Structural Fidelity of Tabular Data

Jiang, Xiangjian, Simidjievski, Nikola, Jamnik, Mateja

arXiv.org Artificial IntelligenceSep-16-2025

Evaluating tabular generators remains a challenging problem, as the unique causal structural prior of heterogeneous tabular data does not lend itself to intuitive human inspection. Recent work has introduced structural fidelity as a tabular-specific evaluation dimension to assess whether synthetic data complies with the causal structures of real data. However, existing benchmarks often neglect the interplay between structural fidelity and conventional evaluation dimensions, thus failing to provide a holistic understanding of model performance. Moreover, they are typically limited to toy datasets, as quantifying existing structural fidelity metrics requires access to ground-truth causal structures, which are rarely available for real-world datasets. In this paper, we propose a novel evaluation framework that jointly considers structural fidelity and conventional evaluation dimensions. We introduce a new evaluation metric, $\textbf{global utility}$, which enables the assessment of structural fidelity even in the absence of ground-truth causal structures. In addition, we present $\textbf{TabStruct}$, a comprehensive evaluation benchmark offering large-scale quantitative analysis on 13 tabular generators from nine distinct categories, across 29 datasets. Our results demonstrate that global utility provides a task-independent, domain-agnostic lens for tabular generator performance. We release the TabStruct benchmark suite, including all datasets, evaluation pipelines, and raw results. Code is available at https://github.com/SilenceX12138/TabStruct.

data mining, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2509.1195

Country:

North America > United States (0.45)
Europe > United Kingdom > England (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry:

Health & Medicine (1.00)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(5 more...)

Add feedback

MyCulture: Exploring Malaysia's Diverse Culture under Low-Resource Language Constraints

Hew, Zhong Ken, Low, Jia Xin, Yang, Sze Jue, Chan, Chee Seng

arXiv.org Artificial IntelligenceAug-11-2025

Large Language Models (LLMs) often exhibit cultural biases due to training data dominated by high-resource languages like English and Chinese. This poses challenges for accurately representing and evaluating diverse cultural contexts, particularly in low-resource language settings. To address this, we introduce MyCulture, a benchmark designed to comprehensively evaluate LLMs on Malaysian culture across six pillars: arts, attire, customs, entertainment, food, and religion presented in Bahasa Melayu. Unlike conventional benchmarks, MyCulture employs a novel open-ended multiple-choice question format without predefined options, thereby reducing guessing and mitigating format bias. We provide a theoretical justification for the effectiveness of this open-ended structure in improving both fairness and discriminative power. Furthermore, we analyze structural bias by comparing model performance on structured versus free-form outputs, and assess language bias through multilingual prompt variations. Our evaluation across a range of regional and international LLMs reveals significant disparities in cultural comprehension, highlighting the urgent need for culturally grounded and linguistically inclusive benchmarks in the development and assessment of LLMs.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2508.05429

Country: Asia > Malaysia (0.54)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Add feedback

Toward Automated Qualitative Analysis: Leveraging Large Language Models for Tutoring Dialogue Evaluation

Gu, Megan, Zhao, Chloe Qianhui, Liu, Claire, Patel, Nikhil, Shah, Jahnvi, Lin, Jionghao, Koedinger, Kenneth R.

arXiv.org Artificial IntelligenceApr-22-2025

Our study introduces an automated system leveraging large language models (LLMs) to assess the effectiveness of five key tutoring strategies: 1. giving effective praise, 2. reacting to errors, 3. determining what students know, 4. helping students manage inequity, and 5. responding to negative self-talk. Using a public dataset from the Teacher-Student Chatroom Corpus, our system classifies each tutoring strategy as either being employed as desired or undesired. Our study utilizes GPT-3.5 with few-shot prompting to assess the use of these strategies and analyze tutoring dialogues. The results show that for the five tutoring strategies, True Negative Rates (TNR) range from 0.655 to 0.738, and Recall ranges from 0.327 to 0.432, indicating that the model is effective at excluding incorrect classifications but struggles to consistently identify the correct strategy. The strategy \textit{helping students manage inequity} showed the highest performance with a TNR of 0.738 and Recall of 0.432. The study highlights the potential of LLMs in tutoring strategy analysis and outlines directions for future improvements, including incorporating more advanced models for more nuanced feedback.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2504.13882

Country:

Europe (0.29)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.18)

Genre: Research Report > New Finding (0.35)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

MANTRA: The Manifold Triangulations Assemblage

Ballester, Rubén, Röell, Ernst, Schmid, Daniel Bin, Alain, Mathieu, Escalera, Sergio, Casacuberta, Carles, Rieck, Bastian

arXiv.org Artificial IntelligenceOct-3-2024

The rising interest in leveraging higher-order interactions present in complex systems has led to a surge in more expressive models exploiting high-order structures in the data, especially in topological deep learning (TDL), which designs neural networks on highorder domains such as simplicial complexes. However, progress in this field is hindered by the scarcity of datasets for benchmarking these architectures. To address this gap, we introduce MANTRA, the first large-scale, diverse, and intrinsically high-order dataset for benchmarking high-order models, comprising over 43,000 and 249,000 triangulations of surfaces and three-dimensional manifolds, respectively. With MANTRA, we assess several graph-and simplicial complex-based models on three topological classification tasks. We demonstrate that while simplicial complex-based neural networks generally outperform their graph-based counterparts in capturing simple topological invariants, they also struggle, suggesting a rethink of TDL. Thus, MANTRA serves as a benchmark for assessing and advancing topological methods, leading the way for more effective high-order models. Success in machine learning is commonly measured by a model's ability to solve tasks on benchmark datasets. While researchers typically devote a large amount of time to build their models, less time is devoted to data and its curation. As a consequence, graph learning is facing some issues in terms of reproducibility and wrong assumptions, which serve as obstructions to progress. An example of this was recently observed while analyzing long-range features: additional hyperparameter tuning resolves performance differences between message-passing (MP) graph neural networks on one side and graph transformers on the other (Tönshoff et al., 2023). In a similar vein, earlier work pointed out the relevance of strong baselines, highlighting the fact that structural information is not exploited equally by all models (Errica et al., 2020). Recently, new analyses even showed that for some benchmark datasets, as well as their associated tasks, graph information may be detrimental for the overall predictive performance (Bechler-Speicher et al., 2024).

dataset, homeomorphism type, triangulation, (15 more...)

arXiv.org Artificial Intelligence

2410.02392

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(4 more...)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

RouterRetriever: Exploring the Benefits of Routing over Multiple Expert Embedding Models

Lee, Hyunji, Soldaini, Luca, Cohan, Arman, Seo, Minjoon, Lo, Kyle

arXiv.org Artificial IntelligenceSep-4-2024

Information retrieval methods often rely on a single embedding model trained on large, general-domain datasets like MSMARCO. While this approach can produce a retriever with reasonable overall performance, models trained on domain-specific data often yield better results within their respective domains. While prior work in information retrieval has tackled this through multi-task training, the topic of combining multiple domain-specific expert retrievers remains unexplored, despite its popularity in language model generation. In this work, we introduce RouterRetriever, a retrieval model that leverages multiple domain-specific experts along with a routing mechanism to select the most appropriate expert for each query. It is lightweight and allows easy addition or removal of experts without additional training. Evaluation on the BEIR benchmark demonstrates that RouterRetriever outperforms both MSMARCO-trained (+2.1 absolute nDCG@10) and multi-task trained (+3.2) models. This is achieved by employing our routing mechanism, which surpasses other routing techniques (+1.8 on average) commonly used in language modeling. Furthermore, the benefit generalizes well to other datasets, even in the absence of a specific expert on the dataset. To our knowledge, RouterRetriever is the first work to demonstrate the advantages of using multiple domain-specific expert embedding models with effective routing over a single, general-purpose embedding model in retrieval tasks.

dataset, etriever, training dataset, (13 more...)

arXiv.org Artificial Intelligence

2409.02685

Genre: Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.89)

Add feedback

Disease Classification and Impact of Pretrained Deep Convolution Neural Networks on Diverse Medical Imaging Datasets across Imaging Modalities

Borah, Jutika, Sarmah, Kumaresh, Singh, Hidam Kumarjit

arXiv.org Artificial IntelligenceSep-2-2024

Imaging techniques such as Chest X-rays, whole slide images, and optical coherence tomography serve as the initial screening and detection for a wide variety of medical pulmonary and ophthalmic conditions respectively. This paper investigates the intricacies of using pretrained deep convolutional neural networks with transfer learning across diverse medical imaging datasets with varying modalities for binary and multiclass classification. We conducted a comprehensive performance analysis with ten network architectures and model families each with pretraining and random initialization. Our finding showed that the use of pretrained models as fixed feature extractors yields poor performance irrespective of the datasets. Contrary, histopathology microscopy whole slide images have better performance. It is also found that deeper and more complex architectures did not necessarily result in the best performance. This observation implies that the improvements in ImageNet are not parallel to the medical imaging tasks. Within a medical domain, the performance of the network architectures varies within model families with shifts in datasets. This indicates that the performance of models within a specific modality may not be conclusive for another modality within the same domain. This study provides a deeper understanding of the applications of deep learning techniques in medical imaging and highlights the impact of pretrained networks across different medical imaging datasets under five different experimental settings.

dataset, highest performance, random initialization, (15 more...)

arXiv.org Artificial Intelligence

2408.17011

Country:

Asia > China > Guangdong Province > Guangzhou (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia > India (0.04)
Africa (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Offline Multi-Objective Optimization

Xue, Ke, Tan, Rong-Xi, Huang, Xiaobin, Qian, Chao

arXiv.org Artificial IntelligenceJun-5-2024

Offline optimization aims to maximize a black-box objective function with a static dataset and has wide applications. In addition to the objective function being black-box and expensive to evaluate, numerous complex real-world problems entail optimizing multiple conflicting objectives, i.e., multi-objective optimization (MOO). Nevertheless, offline MOO has not progressed as much as offline single-objective optimization (SOO), mainly due to the lack of benchmarks like Design-Bench for SOO. To bridge this gap, we propose a first benchmark for offline MOO, covering a range of problems from synthetic to real-world tasks. This benchmark provides tasks, datasets, and open-source examples, which can serve as a foundation for method comparisons and advancements in offline MOO. Furthermore, we analyze how the current related methods can be adapted to offline MOO from four fundamental perspectives, including data, model architecture, learning algorithm, and search algorithm. Empirical results show improvements over the best value of the training set, demonstrating the effectiveness of offline MOO methods. As no particular method stands out significantly, there is still an open challenge in further enhancing the effectiveness of offline MOO. We finally discuss future challenges for offline MOO, with the hope of shedding some light on this emerging field. Our code is available at \url{https://github.com/lamda-bbo/offline-moo}.

algorithm, optimization, search space, (15 more...)

arXiv.org Artificial Intelligence

2406.03722

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(19 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Transportation (0.86)
Materials (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

SLJP: Semantic Extraction based Legal Judgment Prediction

Madambakam, Prameela, Rajmohan, Shathanaa, Sharma, Himangshu, Gupta, Tummepalli Anka Chandrahas Purushotham

arXiv.org Artificial IntelligenceDec-13-2023

Legal Judgment Prediction (LJP) is a judicial assistance system that recommends the legal components such as applicable statues, prison term and penalty term by analyzing the given input case document. Indian legal system is in the need of technical assistance such as artificial intelligence to solve the crores of pending cases in various courts for years and its being increased day to day. Most of the existing Indian models did not adequately concentrate on the semantics embedded in the fact description (FD) that impacts the decision. The proposed semantic extraction based LJP (SLJP) model provides the advantages of pretrained transformers for complex unstructured legal case document understanding and to generate embeddings. The model draws the in-depth semantics of the given FD at multiple levels i.e., chunk and case document level by following the divide and conquer approach. It creates the concise view of the given fact description using the extracted semantics as per the original court case document structure and predicts judgment using attention mechanism. We tested the model performance on two available Indian datasets Indian Legal Documents corpus (ILDC) and Indian Legal Statue Identification (ILSI) and got promising results. Also shown the highest performance and less performance degradation for increased epochs than base models on ILDC dataset.

dataset, ilsi dataset, sljp model, (15 more...)

arXiv.org Artificial Intelligence

2312.07979

Country:

Asia > India > Tamil Nadu > Chennai (0.04)
Asia > India > Andhra Pradesh (0.04)

Genre: Research Report (0.40)

Industry:

Law > Criminal Law (0.49)
Law > Litigation (0.49)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback